- Goals
  - Define learning path
  - Define goals
  - Goal could be to learn more
  - Keep goals in mind
  - Set short- and long-term goals
  - Create an incremental plan
- Accountability
  - Build in accountability to meet the goal
  - Make learning a habit
  - Remember your larger goal
  - Build rewards for your learning habit
- Learning Strategies
  - Break code logic into pseudocode
  - Be patient with yourself
  - Remember your goals
- Technical advice
  - Coding requires debugging
  - Seek solutions
  - Look at successful examples
  - Build code incrementally
- Summary
  - Set goals and accountability measures
  - Break down goals into smaller goals
  - Focus on smaller goals
  - Work through exercises
  - Pseudocode first, code last
Completing a Udacity program takes perseverance and dedication, but the rewards outweigh the challenges. Throughout your program, you will develop and demonstrate specific skills that will serve you for a lifetime. Congratulations on taking the first step towards developing the skills you need to power your career through tech education!
The videos, text lessons, and quizzes you encounter in the classroom are optional but recommended. The project at the end of this course will test your ability to apply the skills and strategies you have learned in the classroom to real-world problems. It will also provide tangible outputs you can use to demonstrate your skills for current and future employers.
The project is designed to be challenging. Many students initially struggle, but with a little grit, they are able to learn from their mistakes and build their skills. Data from nearly 100,000 Udacity graduates show that commitment and persistence are the highest predictors of whether or not a student will graduate.
At some point, nearly every student will get stuck on a new concept or skill, and doubt may set in. Don’t panic. Don’t quit. Be patient, and work through the problem. Remember that you are not alone and the problem that you are encountering is likely one that many others have experienced as well. Whether you are stuck or simply looking for encouragement, you’ll find Udacity Mentors and students there to help.
Technical Mentor Help
Udacity Support Community
General Account Help
Waymo
Dragomir Anguelov
What is Waymo?
Where does Waymo test its vehicles?
The Waymo Open Dataset
As of March 2021, the dataset has two distinct parts: perception data and motion data. We will use the perception data in the first two courses, but the motion data will not be used in the courses, because it was released after the related courses went into production.
Waymo Open Dataset Challenges
Objects in the Waymo Open Dataset
Vehicles, pedestrians
Why is radar omitted from the Waymo Open Dataset?
The most important technologies for self-driving cars
Supervised learning, deep learning
GANs
Validating the Waymo Driver
Prediction for self-driving cars
Why work at Waymo?
Roboticists
computer scientists
deep-learning researchers
hardware experts and so on.
What roles is Waymo hiring for?
What makes a good candidate?
Waymonauts earning Nanodegrees
Why pursue a Nanodegree?
Information on topics such as computer vision and sensor fusion.
Write object-oriented code in Python and use the numpy and matplotlib libraries proficiently (e.g., for matrix multiplication and simple plotting)
Calculate derivatives of common functions (basic calculus skills), for example
Compute matrix multiplications and dot products (basic linear algebra skills)
Overall Course Outline
Introduction to computer vision in the context of self-driving cars (SDC)
Why do we need cameras in an SDC?
From classic computer vision to deep learning
Challenges of object detection in SDCs
Course tools and environment
Final course project
Introduction to deep learning for computer vision (this lesson)
Overview of the machine learning workflow
Linear and logistic regression: an introduction to neural networks
Classifying images with convolutional neural networks
Detecting objects in images
Final project
Self-driving cars or autonomous vehicles will have a huge impact on our society once the technology is deployed at scale. The following articles highlight the economic impact as well as the broader consequences of the technology.
Impacts on society
Improving commute experience
Reducing traffic
Reducing number of accidents
Changing cities layouts
Reducing air pollution
insurance companies
city planners
lawmakers
daily drivers
Artificial Intelligence (AI): a system that leverages information from its environment to make decisions. For example, a video game bot.
Machine Learning (ML): an AI that does not need to be explicitly programmed, and instead learns from data. For example, a spam classification algorithm.
Deep Learning (DL): a subset of ML algorithms that do not require handcrafted features and can work with raw data. For example, an object detection algorithm with a convolutional neural network.
Supervised Learning
In this course, we will focus on supervised learning, where we use annotated data to train an algorithm. In supervised learning, we will define the following:
Input variable / X / observation: the input data to the algorithm. For example, in spam classification, it would be an email.
Ground truth / Y / label: the known label associated with the input data. For example, a human-created label describing the email as spam or not.
Output variable / Ŷ / prediction: the model's prediction given the input data. For example, the model predicts the input as being spam.
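As a toy illustration of these three quantities (using a hypothetical keyword rule in place of a trained model):

```python
# Toy supervised-learning setup: observations X, ground-truth labels Y,
# and model predictions Y_hat. The keyword rule below is a stand-in
# for a real trained model.
X = ["win a free prize now", "meeting at 10am", "free money, click here"]
Y = [1, 0, 1]  # ground truth: 1 = spam, 0 = not spam

def predict(email):
    """A hypothetical 'model': flag emails containing the word 'free'."""
    return 1 if "free" in email else 0

Y_hat = [predict(x) for x in X]
accuracy = sum(p == t for p, t in zip(Y_hat, Y)) / len(Y)
print(Y_hat, accuracy)  # [1, 0, 1] 1.0
```

A real model would learn the decision rule from (X, Y) pairs instead of having it hand-coded, but the roles of observation, label, and prediction are the same.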
Computer Vision for Self-Driving Cars
Self-driving cars have multiple sensors, such as cameras, radar, or lidar. In this course, we will focus on the camera sensor. Using this sensor, the system will be able to perform multiple tasks critical to its autonomy, such as detecting pedestrians, lanes, or traffic signs. Later in the Nanodegree, you will perform sensor fusion using camera and lidar data!
When to Use Deep Learning for Computer Vision
Deep learning algorithms are now the state of the art (SOTA) for most computer vision problems, such as image classification or object detection. Therefore, they are now part of every SDC system. However, using deep learning adds additional constraints to the system, because these algorithms require more computational power.
Artificial neural networks (ANN)
Artificial neural networks (ANN), or simply neural networks, are the type of systems at the core of deep learning algorithms.
ANN: machine learning algorithms vaguely based on human neural networks.
Neuron: the basic unit of a neural network. It takes an input signal and is activated or not based on the input value and the neuron's weights.
Layer: a structure containing multiple neurons. Layers are stacked to create a neural network.
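A single neuron and a layer can be sketched in a few lines of plain Python. This is a simplified illustration with made-up weights, not the course's implementation:

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, passed through
    a sigmoid activation that squashes the result into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A 'layer' is just several neurons sharing the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two neurons, each seeing the same two-dimensional input.
out = layer([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])
print(out)  # [0.5, 0.5]: both weighted sums happen to be zero here
```

Stacking several such layers, with the outputs of one serving as the inputs of the next, gives a neural network.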
In this course, we will be using the TensorFlow library to create our machine learning models. TensorFlow is one of the most popular ML libraries and is used by many companies to develop and train algorithms. TensorFlow makes it very easy for the user to deploy such algorithms on different platforms, from a smartphone device to the cloud.
Signing up for the Waymo Open Dataset
In this course, you will need the following:
Install gsutil: a Python application to manipulate Google Cloud Storage items. You will find the tutorial to install it here.
Create a GitHub account: GitHub hosts projects under the Git version control system. You will need an account to access some of the material and to create your submission for the final project; you can create one here. If you already have an account, you are good to go for this step!
Set up an Integrated Development Environment (IDE): a software application to write code. For this course, I would recommend either PyCharm or VS Code.
For the final project of this course, you will have to train an object detection model using the TensorFlow Object Detection API. This API simplifies the training and development of object detection models in TensorFlow. You will learn how to master it in this project. This API makes the exploration of the optimal parameters for your model extremely easy by using config files. Because you should try to create the best possible model, you will have to tweak and test different parameters. Finally, you will have to perform an in-depth error analysis.
Recap
In this lesson, we focused on the following:
Overall course outline: overview of the different lessons of this course.
Cameras and Computer Vision in SDC: we learned why the camera sensor is critical to SDC systems, and its strength and weaknesses.
From classic computer vision to Deep Learning: we learned about the history of deep learning and discovered the different components of a neural network.
Tools and Environment for the Course: we listed the different tools and software we will be using in this course.
Final Course Project: we listed the different aspects of the final project.
In this lesson, we will learn how to think about machine learning problems. Machine learning (ML) is not just about cool math and modeling; it is also about choosing how to set up the problem and identifying customer needs and long-term goals. This lesson will be organized as follows:
We will practice framing a machine learning problem by identifying the key stakeholders and choosing the right metrics.
Because machine learning is all about data, we will discuss the different challenges related to it.
We will also address how to organize your dataset when solving an ML problem, to ensure that the model you create will perform well on new data.
Finally, we will see how you can leverage different tools to pinpoint your model's limitations.
Throughout this course, we will practice several times with the German Traffic Sign Recognition Benchmark (GTSRB). A downsampled version of the dataset has been downloaded to your workspace.
In the following videos and lessons, we are going to take a deeper dive into each component of the workflow.
Problem setup is the phase where we set the boundaries of the problem; it will be tackled in the next few videos.
The data part of the workflow consists of getting familiar with the available dataset and will be the main focus of the next lesson on the camera sensor.
Modeling is such a critical step that we will spend three lessons on it. Modeling consists of choosing and training different models and picking the best one.
detects sharks from an overview perspective
Unless you are taking part in a machine learning competition, the performance of the model is rarely your only concern. For example, in a self-driving car system, the model's inference time (the time it takes to provide a prediction) is also an important factor. A model that can digest 5 images per second is better than a model that can only process a single image per second, even if the second model performs better. In that case, inference time is also a metric for choosing our model.
Understanding your data pipeline is very important, because it will drive your model development. In some cases, acquiring new data is relatively easy, but annotating it (for example, by associating class names) can be expensive. In that case, you may want to create a model that requires less data or that can handle unlabeled data.
As a Machine Learning Engineer, you will rarely be the end user of your product. Therefore, you need to pinpoint the different stakeholders of the problem you are trying to solve. Why? Because this will drive your model development.
IOU: Congratulations! This definition of IoU can be used for semantic segmentation problems where we try to classify each pixel in an image. For object detection however, we will see a more efficient definition!
IoU: Intersection over Union
How to compute IoU: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
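Following the idea in the linked tutorial, a box IoU can be sketched as follows (boxes given as (x1, y1, x2, y2) corners; this is an illustrative implementation, not the tutorial's exact code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 100x100 boxes overlapping in a 50x50 region.
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # 2500 / 17500 ≈ 0.143
```

IoU is 1 for identical boxes and 0 for disjoint ones, which makes it a convenient bounded metric for comparing predicted and ground-truth boxes.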
In many cases, you will need to gather your own data but in some, you will be able to leverage Open Source datasets, such as the Google Open Image Dataset. However, keep in mind the end goal and where your algorithm will be deployed or used.
Because of something called domain gap, an algorithm trained on a specific dataset may not perform well on another. For example, a pedestrian detection algorithm trained on data gathered with a specific camera may not be able to accurately detect pedestrians on images captured with another camera.
Sensor captures the data
Processing algorithm cleans the data
Label the data
Data is used in an ML algorithm
Machine Learning algorithms may be very sensitive to domain shift. This domain shift can happen at different levels:
weather / light conditions: for example, an algorithm trained only on sunny images is not going to perform well when shown rainy or night-time data.
sensor: a sensor change or different processing methods will create a domain shift.
environment: an algorithm trained on low intensity traffic data will not perform well on high intensity traffic data for example.
An extensive Exploratory Data Analysis (EDA) is critical to the success of any ML project. Why? Because during this phase, the ML engineer gets acquainted with the dataset and discovers any potential challenges with the data. The EDA is such an important part of the project that ML engineers spend a few days on it alone. For a vision problem, it requires looking at 1,000s of images in your dataset!
Some images are darker than others
Some images are blurrier than others
Occlusions will not be a problem in this dataset.
The goal of our ML algorithm is to be deployed in a production environment. For example, the object detection algorithm you will create in the final project could be deployed directly in a self-driving car. But before we can deploy such an algorithm, we need to be sure that it will perform well in any environment it encounters. In other words, we want to evaluate the generalization ability of our model.
We are going to introduce three new concepts:
overfitting: when the model does not generalize well
bias-variance tradeoff: why is it hard to create a balanced model
cross validation: a technique to evaluate how well the model generalizes
When a model overfits, it loses its ability to generalize. This usually happens when the chosen model is too complex and starts extracting noise instead of meaningful features. For example, a car detection model overfits when it starts extracting make-specific features of the cars in the dataset (such as car logos) instead of broader features (wheels, shape, etc.).
Overfitting raises a very important question: how do we know whether our model generalizes properly? Indeed, when only a single dataset is available, it is challenging to know whether we have created a model that overfits or one that simply performs well.
From now on, we will use the term training data for the data used to teach and create the algorithm, and test data for any new, unseen data.
The bias-variance tradeoff illustrates one of the most important challenges in machine learning: how do we create a model that performs well while keeping its ability to generalize to new, unseen data? The performance of our algorithm on such data is quantified by the test error. The test error can be decomposed further into the bias and the variance.
The bias quantifies the quality of the fit of our model on the training data. A low bias means that our model has a very low error rate on the training dataset.
The variance quantifies the sensitivity of the model to the training data. In other words, if we were to replace our training dataset with another one, how much would the training error rate change? A low variance means that our model is not sensitive to the training data and generalizes well.
Validation Sets & Cross Validation
Cross validation is a set of techniques to evaluate the capacity of our model to generalize and alleviate the overfitting challenges. In this course, we will leverage the validation set approach, where we split the available data into two splits:
a training set, used to create our algorithm (usually 80-90% of the available data)
a validation set used to evaluate it (10-20% of the available data)
In further videos, we will see how we can leverage this approach to alleviate the overfitting problem.
Other cross validation methods exist, such as LOO (Leave One Out) or k-fold cross validation but they are not suited to Deep Learning algorithms. You can read more about these other two techniques here.
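A minimal sketch of the validation set approach, assuming a 90/10 split on a list of sample indices (a real pipeline would shuffle samples and labels together):

```python
import random

def train_val_split(samples, val_fraction=0.1, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_val_split(list(range(100)), val_fraction=0.1)
print(len(train), len(val))  # 90 10
```

Shuffling before splitting matters: datasets are often ordered (by time, by class, by recording session), and an unshuffled split would make the validation set unrepresentative.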
TF Records are TensorFlow’s custom data format. Even though they are not technically required to train a model with TensorFlow, they can be very useful. For some pre-existing TensorFlow APIs, such as the object detection API that we will use for the final project, a TF Record format is required to train models.
Waymo Open Dataset vs. TensorFlow Object Detection API
In the final project of this course, you will use data from the Waymo Open Dataset with the TensorFlow Object Detection API to perform object detection on camera images. While both use .tfrecord files, the structure of each differs. As such, the upcoming exercise will have you take a .tfrecord from the Waymo Open Dataset and convert it into a new .tfrecord usable by the TensorFlow Object Detection API.
While also linked in the exercise itself, you’ll need a few resources to be able to do so more easily.
First, this repository gives some additional information about the Waymo Open Dataset itself (note that the Waymo link to https://waymo.com/open/data/ therein should now be https://waymo.com/open/data/perception, as Waymo has since added a "motion" component to the previously perception-only dataset).
Secondly, this tutorial for the TF Object Detection API for converting from .xml to .tfrecord also shows certain steps that will apply in our case as well.
This exercise will require some research on your own of the above documentation (and potentially other documentation) to reach a converted file; however, if you get stuck, it is perfectly reasonable to skip ahead to the solution video for some assistance.
Additional Resources
The above documentation will be useful to refer to as you work on the upcoming exercise.
ML Engineers get very excited about creating new models. However, before diving into this step of the ML workflow, one must set realistic expectations, by setting up baselines.
A lower bound baseline gives you an idea of the minimum expected performance. If you are getting metrics below this baseline, a red flag should be raised, and you should be concerned that something is wrong with your training pipeline. For example, for a classification problem, the random guess baseline is a good lower bound. Given C classes, the accuracy of your algorithm should be higher than 1/C.
An upper bound baseline gives you a sense of the maximum expected performance. If a client comes to you and asks for an algorithm that classifies images correctly 100% of the time, you can safely let them know that it won’t happen. Human performance is a good upper bound baseline. For a classification problem, you should try to manually classify 100s of images to get an idea of what level of performance your algorithm could reach.
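The random-guess lower bound described above can be checked empirically; this sketch assumes a hypothetical problem with C = 4 classes:

```python
import random

rng = random.Random(42)
num_classes = 4
n = 10_000

# Hypothetical uniformly distributed labels and uniformly random guesses.
labels = [rng.randrange(num_classes) for _ in range(n)]
guesses = [rng.randrange(num_classes) for _ in range(n)]

# Random guessing should score close to 1/C accuracy.
accuracy = sum(g == y for g, y in zip(guesses, labels)) / n
print(accuracy)  # close to 0.25
```

Note that 1/C is only the right lower bound for balanced classes; with imbalanced data, always predicting the majority class is the stronger trivial baseline.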
Model selection is a dynamic part of the ML workflow. It requires many iterations. Unless you have some prior knowledge of the task, it is recommended to start with simple models and iterate on complexity. Keep in mind that the validation set should remain the same during this phase!
Validation set metrics are a good indicator of global performances of the model but we often need a finer understanding. A metric like accuracy won’t tell you if a certain class of objects is always misclassified, for example. For these reasons, one must perform an in-depth error analysis before iterating on the model.
Sorting predictions based on the metric or loss values is always a useful way to identify error patterns.
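Sorting predictions by loss, as suggested above, can be as simple as the following sketch (the per-sample records are hypothetical):

```python
# Hypothetical per-sample results: (sample id, loss value).
results = [("img_003", 0.12), ("img_001", 2.31), ("img_002", 0.87)]

# Highest-loss samples first: these are the ones to inspect by hand.
worst_first = sorted(results, key=lambda r: r[1], reverse=True)
print(worst_first[0])  # ('img_001', 2.31)
```

Manually inspecting the top of this list often reveals a common pattern (a lighting condition, an object class, a sensor artifact) that aggregate metrics hide.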
Congratulations! If the training dataset is missing examples of data that occurs in the validation set, you should try to increase its size (or use augmentations, as we will see in later lessons).
The Machine Learning workflow is organized as follows:
Frame the problem: understand the stakes, and define relevant metrics.
Understand the data: perform an Exploratory Data Analysis, and extract patterns from the dataset.
Iterate on the model: create a validation set, set up baselines, and iterate on models from simpler to more complex.
Learn how to calibrate your camera to remove distortions for improved perception.
This lesson will be organized as follows:
The camera sensor and its distortion effect
The camera pinhole model
Camera calibration
RGB and other color systems
Image manipulation in Python
Cameras are optical instruments that capture light intensity on a digital image. The most important characteristics of a camera for an ML engineer are the following:
Resolution: number of pixels that make up the image captured by the camera (usually described in megapixels).
Aperture: size of the opening where light enters the camera. Controls the amount of light received by the sensor.
Shutter speed: duration that the sensor is exposed to the light. Also controls the amount of light received by the sensor.
Focal length / field of view: this parameter controls the angle of view of the image.
Distortion
Image distortion occurs when a camera looks at 3D objects in the real world and transforms them into a 2D image; this transformation isn’t perfect. Distortion actually changes what the shape and size of these 3D objects appear to be. So, the first step in analyzing camera images, is to undo this distortion so that you can get correct and useful information out of them.
Why is it important to correct for image distortion?
Distortion can change the apparent size of an object in an image.
Distortion can change the apparent shape of an object in an image.
Distortion can cause an object's appearance to change depending on where it is in the field of view.
Distortion can make objects appear closer or farther away than they actually are.
Types of Distortion
Real cameras use curved lenses to form an image, and light rays often bend a little too much or too little at the edges of these lenses. This creates an effect that distorts the edges of images, so that lines or objects appear more or less curved than they actually are. This is called radial distortion, and it’s the most common type of distortion.
Another type of distortion, is tangential distortion. This occurs when a camera’s lens is not aligned perfectly parallel to the imaging plane, where the camera film or sensor is. This makes an image look tilted so that some objects appear farther away or closer than they actually are.
Distortion Coefficients and Correction
There are three coefficients needed to correct for radial distortion: k1, k2, and k3. To correct the appearance of radially distorted points in an image, one can use a correction formula.
In the following equations, (x, y) is a point in a distorted image. To undistort these points, OpenCV calculates r, the known distance between a point in an undistorted (corrected) image and the center of the image distortion, which is often the center of that image. This center point is sometimes referred to as the distortion center. These points are pictured below.
Note: The distortion coefficient k3 is required to accurately reflect major radial distortion (like in wide angle lenses). However, for minor radial distortion, which most regular camera lenses have, k3 has a value close to or equal to zero and is negligible. So, in OpenCV, you can choose to ignore this coefficient; this is why it appears at the end of the distortion values array: [k1, k2, p1, p2, k3]. In this course, we will use it in all calibration calculations so that our calculations apply to a wider variety of lenses (wider, like wide angle, haha) and can correct for both minor and major radial distortion.
Points in a distorted and undistorted (corrected) image. The point (x, y) is a single point in a distorted image and (x_corrected, y_corrected) is where that point will appear in the undistorted (corrected) image.
Radial distortion correction.
There are two more coefficients that account for tangential distortion: p1 and p2, and this distortion can be corrected using a different correction formula.
Tangential distortion correction.
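The radial and tangential terms can be combined into one short sketch of OpenCV's forward distortion model, which maps an ideal normalized point to its distorted position (cv2.undistort effectively applies the inverse of this mapping; the function and sample values below are illustrative):

```python
def distort_point(x, y, k1, k2, p1, p2, k3):
    """Forward distortion model: map an ideal normalized image point
    (x, y) to where it appears in the distorted image, using the
    radial coefficients k1, k2, k3 and tangential coefficients p1, p2."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# With all coefficients zero, points are unchanged.
print(distort_point(0.3, 0.4, 0, 0, 0, 0, 0))  # (0.3, 0.4)
```

Note that the radial terms scale the point outward or inward as a function of r² alone, while the tangential terms depend on the lens/sensor misalignment described above.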
Examples of Useful Code:
1.# Converting an image, imported by cv2 or the glob API, to grayscale:
2.gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
3.
4.# Finding chessboard corners (for an 8x6 board):
5.ret, corners = cv2.findChessboardCorners(gray, (8,6), None)
6.
7.# Drawing detected corners on an image:
8.img = cv2.drawChessboardCorners(img, (8,6), corners, ret)
9.
10.# Camera calibration, given object points, image points, and the shape of the grayscale image:
11.ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
12.
13.# Undistorting a test image:
14.dst = cv2.undistort(img, mtx, dist, None, mtx)
A Note on Image Shape
The shape of the image passed to the calibrateCamera function is simply the height and width of the image. One way to retrieve these values is from the grayscale image shape array: gray.shape[::-1]. This returns the image width and height in pixel values, e.g. (1280, 960).
Another way to retrieve the image shape is directly from the color image, by taking the first two values of the color image shape array with img.shape[1::-1]. This code snippet asks for just the first two values in the shape array and reverses them. Note that in our example we are using a grayscale image, so we only have 2 dimensions (color images have three: height, width, and depth), and therefore this is not strictly necessary.
It is important to use either the entire grayscale image shape or only the first two values of a color image shape. This is because the full shape of a color image includes a third value, the number of color channels, in addition to the image's height and width. For example, the shape array of a color image might be (960, 1280, 3): the pixel height and width of the image (960, 1280) plus a third value (3) representing the three color channels. You will learn more about this later; if you try to pass all three of these values into the calibrateCamera function, you will get an error.
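The two shape-slicing idioms from this note can be verified on dummy numpy arrays standing in for real images:

```python
import numpy as np

gray = np.zeros((960, 1280), dtype=np.uint8)       # grayscale: (height, width)
color = np.zeros((960, 1280, 3), dtype=np.uint8)   # color: (height, width, channels)

print(gray.shape[::-1])    # (1280, 960) -> full shape, reversed
print(color.shape[1::-1])  # (1280, 960) -> first two values, reversed
```

Both expressions produce the same (width, height) tuple that calibrateCamera expects, which is why either works depending on whether you hold a grayscale or a color image.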
Grayscale images are single channel images that only contain information about the intensity of the light.
Color models are mathematical models used to describe digital images. The Red, Green, Blue (RGB) color model describes images using three channels. Each pixel in this model is described by a triplet of values, usually 8-bit integers. This is the most common color model used in ML. HLS/HSV are also very popular color models. They take a different approach than the RGB model by encoding the color with a single value, the hue. The other two values characterize the darkness / colorfulness of the image.
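Python's standard colorsys module illustrates the relationship between the RGB and HLS models (channel values normalized to [0, 1]):

```python
import colorsys

# Pure red in RGB maps to hue 0, medium lightness, full saturation in HLS.
h, l, s = colorsys.rgb_to_hls(1.0, 0.0, 0.0)
print(h, l, s)  # 0.0 0.5 1.0

# A darker red keeps the same hue; only the lightness changes.
h2, l2, s2 = colorsys.rgb_to_hls(0.5, 0.0, 0.0)
print(h2, l2)  # 0.0 0.25
```

This is exactly why HLS/HSV suit color thresholding: a color of interest occupies a narrow hue range regardless of how bright or dark the lighting is.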
Pillow is a Python imaging library. Using Pillow, we can easily load images, convert them from one color model to another, and perform diverse pixel-level transformations, such as color thresholding. Color thresholding consists of isolating a range of colors in a digital image. It can be done using different color models, but the HSV/HLS color models are particularly well suited to this task.
You can use the workspace below to try out the same code from the video as well as the second video further down the page.
Image Enhancement and Filtering
Images in ML datasets reflect real-life conditions and therefore may need to be improved upon or modified. Pillow provides a very useful module, ImageEnhance, to perform pixel-level transformations on images, such as contrast changes. Moreover, ML engineers often want to add some noise to the images in the dataset to reduce overfitting. ImageEnhance provides simple ways of doing so.
In addition to pixel level transformation, Pillow also provides ways to perform geometric transformations, such as rotation, resizing or translation. In particular, we can use Pillow to perform affine transformation (a geometric transformation where lines are preserved) using a transformation matrix.
Geometric transformations quiz
In this lesson, we learned about:
- The camera sensor and its distortion effect: a camera captures light on a digital sensor, but the raw images are distorted.
- The camera pinhole model: a simplified physical model of cameras.
- Camera calibration: how to use the Python library OpenCV to calibrate a camera using checkerboard images.
- RGB and other color systems: we discovered the RGB, HLS, and HSV color systems and learned about the strengths and weaknesses of each one.
- Image manipulation in Python: how to leverage Pillow to perform pixel-level and geometric transformations of digital images.
Use the Waymo dataset to detect objects in an urban environment.
Welcome to this lesson on LiDAR technology. Without LiDAR sensors, we will most probably not see fully self-driving cars become a reality.
In the first chapter, we will start with the general role of LiDAR in autonomous driving first. You will learn about the various levels of autonomous driving, get a brief introduction to camera, LiDAR and radar and we will discuss criteria you need to consider for sensor selection.
In the second chapter, we will look at the LiDAR sensors used in Waymo vehicles. We will briefly look into the most important technical specifications, discuss the structure of the Waymo Open Dataset and I will introduce you to the course starter code for many of the exercises.
In the third chapter, you will focus on LiDAR technology. You will learn about the LiDAR working principle, the LiDAR equation and the meaning of multiple signal returns. Also, you will get an overview of currently available LiDAR types and the major differences between them.
We will also look at the concept of range images used in the Waymo dataset. You will learn how range images are structured and how you can transform them into 3d point-clouds.
The Role of LiDAR in Autonomous Driving
The "levels of driving automation" as defined by the Society of Automotive Engineers (SAE).
Advanced driver-assistance systems (ADAS), such as forward collision warning and braking or adaptive cruise control, only automate at least a single vehicle function (for safety or comfort) in certain driving situations.
Commercial vehicles with Level 3 systems
Such systems must be reliable enough to minimize wrong decisions. Engineers usually address this by adding a large number of sensors to the car, which makes such systems (very) expensive.
Fear of accident-related lawsuits leads to deliberately reduced system availability, for example by restricting the driving speed or scenario (e.g., highway-only operation below 60 kph with clearly visible lane markings).
There is no guarantee that the driver is ready to take control of the vehicle at any time. Given human reaction times and alertness levels, this is impossible in many situations.
Autonomous Vehicle Sensor Types
Cameras
Cameras are already commonplace on many cars and are used for a variety of purposes, such as vehicle and pedestrian detection, lane detection, road sign recognition, or simply presenting the driver with a view of the area behind the car. Cameras belong to the group of passive sensors, meaning they capture ambient light reflected from objects and convert it into a two-dimensional image.
Self-driving cars rely heavily on cameras for many varied functions, which go far beyond obstacle detection (e.g., high-definition mapping and localization).
However, many situations exist in traffic where the light reflected from objects is insufficient for stable detection (e.g., poorly lit areas at night, heavy rain, direct sunlight). In such cases, other sensor types that do not depend on ambient conditions are needed.
Radar
Like cameras, many of today's vehicles are already equipped with one or more radar sensors. This sensor type is most often used for driver-assistance systems such as adaptive cruise control or automatic emergency braking.
Radars are active sensors: they emit electromagnetic waves, which are reflected by objects with certain properties (e.g., metal). Based on the signal's travel time, radar sensors can estimate distance very accurately. Also, based on a physical principle called the Doppler effect, radar can measure the speed of moving objects by evaluating the frequency shift in the returned signal, which is proportional to the object's velocity.
One of the major advantages of radar sensors is their ability to work reliably in almost all weather conditions. However, radar has a very low spatial resolution, which is why it is not well suited to measuring the dimensions of objects very accurately. Also, objects with little or no metal content (such as pedestrians) do not produce a strong echo signal and are hard to separate from background noise.
Automotive radar sensors usually operate in one of two frequency bands: 24 GHz radar sensors are used for short-range applications and have a wider opening angle, while 77 GHz sensors are used for long-range sensing within a narrower cone.
LiDAR
LiDAR also belongs to the group of active sensors. The basic principle is to emit a laser beam and measure the time it takes for the light to reflect off an object and return to the sensor. The most prominent type of LiDAR currently used in self-driving cars is a top-mounted device on the roof that rotates rapidly in a 360-degree arc, generating thousands of measurements per second.
Such a rotating LiDAR sensor can create a very accurate 3D point map of the surroundings and can therefore detect vehicles, pedestrians, cyclists, and other obstacles. This sensor type is also called a scanning LiDAR, as it needs to move its parts to rasterize ("scan") the field of view step by step.
However, currently available sensors of this type have a clear disadvantage: they usually cost thousands of dollars, depending on the model and features. Despite engineers' efforts to drive down cost, LiDAR remains the most expensive (and bulkiest) sensor on a self-driving car.
Currently, the LiDAR industry is working on three core problems:
Reducing unit price
Reducing package size
Increasing sensing range and resolution
As an alternative to scanning LiDAR, there are also non-scanning sensors, known as flash LiDAR. The term "flash" refers to the field of view being illuminated entirely by the laser source, just like a camera with a flash, while an array of photodetectors receives the reflected laser pulses simultaneously.
Flash LiDAR sensors have no moving parts, which is why they are vibration-resistant and much smaller in package size than scanning LiDAR sensors. Compared to roof-mounted LiDAR types, the downsides of this sensor type are its limited range and its relatively narrow field of view. In self-driving cars, both scanning and non-scanning LiDARs are used to observe different areas around the vehicle: a roof-mounted scanning LiDAR generates a 360-degree view out to roughly 80-100 m, while non-scanning LiDAR sensors (usually mounted at the four corners) observe the vehicle's immediate vicinity within the top-mounted sensor's blind spot.
Other Sensor Types
Besides cameras, radar, and LiDAR, other sensor types are available, such as ultrasonic sensors (widely used for parking applications since the 1990s) or stereo cameras (sometimes also called pseudo-LiDAR). However, these sensors are beyond the scope of this course. From a sensor fusion perspective, it makes the most sense to combine the camera sensor with LiDAR, radar, or both, in order to obtain a reliable and accurate reconstruction of the vehicle's surroundings.
In this section, you will learn about sensor selection criteria and how cameras, LiDAR, and radar differ with regard to each of them.
1. Range: LiDAR and radar systems can detect objects at distances ranging from a few meters to more than 200 m. Many LiDAR systems have difficulty detecting objects at very close range, whereas radar can detect objects less than a meter away, depending on the system type (long-, mid-, or short-range). Monocular cameras cannot reliably measure the metric distance to an object; this is only possible by making assumptions about the nature of the world (e.g., a flat road surface). Stereo cameras, on the other hand, can measure distance, but only up to about 80 m, with accuracy deteriorating significantly from there.
2. Spatial resolution: LiDAR scans have a spatial resolution of about 0.1° due to the short wavelength of the emitted infrared laser. This allows for high-resolution 3D scans that can characterize the objects in a scene. Radar, on the other hand, cannot resolve small features well, especially as distance increases. The spatial resolution of a camera system is defined by its optics, the pixel size on the imager, and its signal-to-noise ratio. Details of small objects are lost as soon as the light rays emanating from them are spread over several pixels on the image sensor (blurring). Spatial resolution also degrades when little ambient light is available to illuminate objects, as object details are superimposed by the increased noise level of the image sensor.
3. Robustness in darkness: Both radar and LiDAR have excellent robustness in darkness, as they are both active sensors. While the daytime performance of LiDAR systems is very good, they perform even better at night, because there is no ambient sunlight that could interfere with the detection of infrared laser reflections. Cameras, on the other hand, have very low detection capability at night, since they are passive sensors that rely on ambient light. Although the night-time performance of image sensors has improved, they have the lowest performance of the three sensor types.
4. Robustness in rain, snow, fog: One of the biggest advantages of radar sensors is their performance in adverse weather conditions. They are not significantly affected by snow, heavy rain, or any other airborne obstructions such as fog or sand particles. As optical systems, LiDAR and cameras are susceptible to adverse weather, and their performance usually degrades significantly with increasing levels of adversity.
5. Object classification: Cameras excel at classifying objects such as vehicles, pedestrians, speed signs, and many others. This is one of the key advantages of camera systems, and recent advances in AI emphasize it even more. LiDAR scans with dense 3D point clouds also allow a certain level of classification, albeit with less object diversity than cameras. Radar systems do not allow for much object classification.
6. Perceiving 2D structures: Camera systems are the only sensors able to interpret two-dimensional information such as speed signs, lane markings, or traffic lights, as they are able to measure both color and light intensity. This is the major advantage of cameras over the other sensor types.
7. Measuring speed: Radar can directly measure the velocity of objects by exploiting the Doppler frequency shift. This is one of the key advantages of radar sensors. LiDAR can only approximate speed by using successive distance measurements, which makes it less accurate in this regard. Cameras, even though they are unable to measure distance, can measure time to collision by observing the displacement of objects on the image plane. This property will be used later in this course.
8. System cost: Radar systems have been widely used in the automotive industry in recent years, and current systems are very compact and affordable. The same holds for monocular cameras, which in most cases cost well below US$100. Stereo cameras are more expensive due to increased hardware cost and a significantly lower number of units on the market. LiDAR has gained popularity over the last few years, especially in the automotive industry. Due to technological advances, its cost has dropped from more than US$75,000 to below US$5,000. Many experts predict that the cost of a LiDAR module might drop below US$500 within the next few years.
9. Package size: Both radar and monocular cameras can be integrated very well into vehicles. Stereo cameras are in some cases bulky, which makes it harder to integrate them behind the windshield, as they can obstruct part of the driver's field of vision. LiDAR systems come in various sizes: 360° scanning LiDARs are typically mounted on top of the roof and are thus very well visible. The industry shift toward much smaller solid-state LiDAR systems will reduce the system size of LiDAR sensors significantly in the near future.
10. Computational requirements: LiDAR and radar require little back-end processing. While cameras are a cost-efficient and easy-to-use sensor, they require significant processing to extract useful information from the images, which adds to the overall system cost.
As you can see from the following image, Waymo also uses several cameras as well as radar sensors for front/back surveillance.
The LiDAR sensors can be categorized into two broad groups:
Perimeter LiDAR: This sensor has a vertical field of vision ranging from -90° to +30° with a range limited to 0-20m. The range limit is imposed on the data by Waymo and is only present in the Waymo Open Dataset. It can be assumed that the actual sensor range is significantly higher. Perimeter LiDARs are located on the front and back of the Waymo driverless vehicle as well as on the left / right front corners. Interestingly, the perimeter LiDAR is sold to companies not in direct competition with Waymo under the brand name Laser Bear Honeycomb. The following image shows the 3d point-cloud generated by the front-left LiDAR sensor:
360 LiDAR: The top LiDAR has a vertical field of vision ranging from -17.6° to +2.4° with a range limited to 75m in the dataset. This LiDAR rotates around its vertical axis and produces a high-resolution 3d image over the 360° circumference of the vehicle. In the course, we will work with data from this sensor type to detect vehicles. The following image shows a 3d point-cloud generated by this LiDAR sensor in birds-view perspective:
Two aspects are noteworthy here: (1) the distance between adjacent scanner lines increases with growing distance, and (2) the area in the direct circumference of the vehicle does not contain any 3d points. Both observations can be easily explained by a look at the geometry of the sensor-vehicle setup:
Since the laser beams are occluded by the vehicle itself, there is a large perception gap ("blind spot") directly in front of the vehicle. Also, it can be seen that the gaps between adjacent beams widen with increasing distance, due to the fixed angles at which the laser diodes are positioned vertically.
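The widening gap between adjacent scan lines follows directly from geometry: with a fixed angular spacing between laser diodes, the lateral gap between neighboring beams grows with distance. A quick sketch, assuming a hypothetical angular spacing of 0.3°:

```python
import math

# Angular spacing between adjacent laser beams (hypothetical value).
delta_theta = math.radians(0.3)

# Lateral gap between neighboring beams at a given range is roughly
# d * tan(delta_theta), so it grows with distance d.
for d in (10, 40, 75):  # meters
    gap = d * math.tan(delta_theta)
    print(f"range {d:3d} m -> beam gap {gap:.2f} m")
```

At 75 m the gap between neighboring beams is already several decimeters, which is why distant objects are sampled by far fewer points than nearby ones.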
LiDAR beam start and stop pulse
Other time-of-flight methods are radar and ultrasound. Of these three ToF techniques, LiDAR provides the highest angular resolution, because of its significantly smaller beam divergence. It thus allows a better separation of adjacent objects in a scene, as illustrated in the following figure:
LiDAR and radar beam divergence
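The time-of-flight principle behind these sensors reduces to a one-line computation: range is half the round-trip time multiplied by the speed of light. A quick sketch:

```python
# Speed of light in m/s.
C = 299_792_458

def tof_range(round_trip_seconds):
    """Range from a time-of-flight measurement: the pulse travels to the
    target and back, so the one-way distance is half the path length."""
    return C * round_trip_seconds / 2

# A return pulse arriving 500 ns after emission corresponds to roughly 75 m.
print(f"{tof_range(500e-9):.1f} m")  # 74.9 m
```

The same relation governs radar and ultrasound; only the propagation speed and the beam divergence differ.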
The following figure shows a typical classification of LiDAR sensors:
Scanning LiDAR - Motorized Opto-Mechanical Scanning
Motorized optomechanical scanners are the most common type of LiDAR scanners. In 2007, the company Velodyne, a pioneer in LiDAR technology, released a 64-beam rotating line scanner, which clearly shaped and dominated the autonomous vehicle industry in its early years. The most obvious advantages of this scanner type are its long ranging distance, wide horizontal field-of-view, and fast scanning speed.